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Abstract 


An efficient algorithm for the determination of Bayesian optimal discriminating designs 
for competing regression models is developed, where the main focus is on models with 
general distributional assumptions beyond the “classical” case of normally distributed 
homoscedastic errors. For this purpose we consider a Bayesian version of the Kullback- 


Leibler (KL) optimality criterion introduced by Lopez-Fidalgo et al. (2007). Discretizing 
the prior distribution leads to local KL-optimal discriminating design problems for a 
large number of competing models. All currently available methods either require a large 
computation time or fail to calculate the optimal discriminating design, because they 
can only deal efficiently with a few model comparisons. In this paper we develop a new 
algorithm for the determination of Bayesian optimal discriminating designs with respect 
to the Kullback-Leibler criterion. It is demonstrated that the new algorithm is able to 
calculate the optimal discriminating designs with reasonable accuracy and computational 
time in situations where all currently available procedures are either slow or fail. 


Keyword and Phrases: Design of experiment; Bayesian optimal design; model discrimination; 
gradient methods; model uncertainty; Knllback-Leibler distance 
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1 Introduction 


Although optimal designs can provide a substantial improvement in the statistical accuracy 
without making any additional experiments, classical optimal design theory [see for example 


Pukelsheim (2006); Atkinson et al. (2007)] has been criticized, because it relies heavily on the 
specification of a particular model. In many cases a good design for a given model might be 
inefficient if it is used in a different setup. Most of the literature addressing the problem of model 
uncertainty in the design of experiments can be roughly divided into two parts, where all authors 
assume that a certain class of parametric models is available to describe the relation between the 
predictor and the response. One approach to obtain model robustness is to construct designs 
which allow the efficient estimation of parameters in all models under consideration. This is 
usually achieved by optimizing composite optimality criteria, which are defined as an average 
of the criteria for the different models [see Lauter (1974), Dette (1990); Biedermann et al. 


(2006); Dette et al. (2008)]. Alternatively, one can directly construct designs to discriminate 


between several competing models. An early reference is Stigler (197l| who determined designs 
for discriminating between two nested univariate polynomials by minimizing the volume of the 
confidence ellipsoid for the parameters corresponding to the extension of the smaller model. 
Since this seminal paper several authors have followed this line of research [see for example 


Dette and Haller (1998) or Song and Wong (1999) among others]. A completely different 
approach for the construction of optimal designs for model discrimination was suggested by 


Atkinson and Fedorov (1975a). The corresponding optimality criterion is called T-optimality 


criterion. To be precise, assume that the relation between the response Y and predictor x is 
described by a nonlinear regression model such that 


( 1 . 1 ) 


E[y|a:] = r)(x,9) , Var(y|x) =v 2 (x,8) , 


and that the experimenter considers two rival models, say r/i, 772 , as candidates for the parametric 


form of the mean. Roughly speaking, Atkinson and Fedorov (1975a) assumed homoscedasticity, 


fixed one model, say r/j, and constructed the design such that the sum of squares for a lack of 
fit test against the alternative 77 2 is large. The criterion was extended in several directions. For 


example, Atkinson and Fedorov (1975b) considered the problem of discriminating a selected 


model r]i from a class of other regression models, say { 772 ,... , 77 ^}, v > 2, and Tommasi (2009) 


combined the T-criterion with the approach introduced by Lauter (1974). Ucinski and Bo- 


gacka (2005) remarked that the criterion introduced by Atkinson and Fedorov (1975a) is only 


applicable in the case of homoscedastic errors in the regression model ( 1 . 1 ) and discussed an ex¬ 


tension to the case of heteroscedasticity. More generally, Lopez-Fidalgo et al. (2007) introduced 
a generalization of the T-optimality criterion which is applicable under general distributional 
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assumptions and called KL-optimality criterion. Meanwhile the determination of KL-optimal 


discriminating designs has been discussed by several authors [see Tommasi (2009); Tommasi 


and Lopez-Fidalgo (2010) among others]. 


It is important to note here that the T-optimality criterion and its extensions are local optimal¬ 
ity criteria in the sense of Chernoff (1953), because they require the explicit knowledge of the 
parameters in the model rj\. As a consequence, optimal designs with respect to the T-optimality 


criterion might be sensitive with respect to misspecihcation of the parameters [see Dette et al. 


(2012) for a striking example], A standard approach to obtain robust designs [which was already 
mentioned by|Atkinson and Fedorov (1975a)] is the use of a Bayesian T-optimality criterion. 
This criterion is defined as an expectation of various local T-optimality criteria with respect 
to a prior distribution. Dette et al. (2012) derived some explicit Bayesian T-optimal designs 
for polynomial regression models, but in general these designs have to be found numerically in 
nearly all cases of practical interest. Recently, Dette et al. (2015) pointed out that the numeri¬ 
cal construction of Bayesian T-optimal designs is an extremely difficult optimization problem, 
because - roughly speaking - the Bayesian optimality criterion corresponds to an optimal de¬ 
sign problem for model discrimination with an extremely large number of competing models. 
As a consequence, the commonly used algorithms for the calculation of optimal designs, such 
as exchange-type methods or multiplicative methods and their extensions, cannot be applied 
to determine the Bayesian T-optimal discriminating design in reasonable computational time. 


Dette et al. (2015) proposed a new algorithm for the calculation of Bayesian T-optimal dis¬ 


criminating designs and demonstrated its efficiency in several numerical examples. A drawback 
of this method consists still in the fact that it is only applicable to the “classical” Bayesian 
T-optimality criterion which refers to the nonlinear regression model (1.1) with homoscedastic 
and normally distributed responses, i.e. P i ~ J\f(r](x,9),v 2 (x,9)). 


The purpose of the present paper is to extend the methodology introduced by Dette et al. 


(2015) to regression models with more general distributional assumptions. In Section[2]we will 
introduce a Bayesian KL-optimality criterion which extends the criterion introduced by Lopez- 


Fidalgo et al. (2007) to address for uncertainty in the model parameters. The criterion has 


also been discussed in Tommasi and Lopez-Fidalgo (2010), who considered only two competing 
regression models. The new algorithm is proposed in Section [3] and combines some features of 
the classical exchange type algorithms with gradient methods and quadratic programming. In 
Section [4] we illustrate the applicability of the new method in several examples. In particular, we 
determine optimal discriminating designs with respect to the Bayesian KL-optimality criterion 
in situations where all other methods fail to find the optimal design. Finally, the appendix 
contains a proof of an auxiliary result. 
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2 KL-optimal discriminating designs 


The regression model (1.1) is a special case of a more general model, where the distribution of 


the random variable Y has a density, say f(y,x), and x denotes an explanatory variable, which 
varies in a compact design space X. We assume that observations at different experimental 


conditions are independent. Following Kiefer (1974) we consider approximate designs that are 


defined as probability measures, say £, with finite support. The support points xi,...,Xk of a 
design £ give the locations where observations are taken, while the weights uji, ... ,ujk describe 
the relative proportions of observations at these points. If an approximate design is given and n 
observations can be taken, a rounding procedure is applied to obtain integers n % (i — 1,..., k) 
from the not necessarily integer valued quantities upn such that Yli=i n i = n - 
Assume that the experimenter wants to choose a most appropriate model from a given class, 
say {fa, ... ,/„} of competing models, where fj(y,x,6j ) denotes the density of the j th model 
with respect to a sigma-finite measure, say y. The parameter 0 3 varies in a compact parameter 
space ©j (j = 1,... ,u). The models may contain additional nuisance parameters, which will 
not be displayed in our notation. For two competing models, say fa and fa, we denote by 


( 2 . 1 ) 


fa.fax, 0 t , Bfa = / fa(y,x,9i) log 


fi(y,z,9j 


-fi(dy) 


the Kullback-Leibler distance between fa and fj. If the model fa is assumed to be the “true” 
model with parameter 6i, then Lopez-Fidalgo et al. (2007) defined a local KL-optimal discrim¬ 


inating design for the models /j and fj as a design maximizing the optimality criterion 


( 2 . 2 ) 


- inf / Iij(x,6i,6ij)£(dx). 
J X 


This criterion can now easily be extended to construct optimal discriminating designs for more 


than two competing models. Following Tonmrasi and Lopez-Fidalgo (2010) and Braess and 


Dette (2013) we denote by pij nonnegative weights reflecting the importance of the comparison 
between the the model /j and fj, where /j is assumed as the “true” model. The (symmetrized) 
KL-optimality criterion for more than v > 2 competing models f\,..., /„ is then defined by 


(2.3) KLp(£) = V PijKUfat, Of) = V p itj inf / fafax, 0 U O^dx), 

i l 9ijeQj 


and a design maximizing the criterion (2.3) is called local KLp-optimal discriminating design 
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for the models j \,..., /„. For a design £ we also introduce the notation 


(2.4) 0^(0 = arginf [ I id (x, 9 it 0ij)£(dx), 

OijCOj JX 

Our first result characterizes local KL-optimal discriminating design and will be helpful to check 
the optimality of the numerically constructed designs. Its proof can be obtained by standard 
arguments and is therefore omitted. 


Theorem 2.1 Let 


Assumption 2.1 For each i = 1,... ,is the function fi(-, ■,9i ) is continuously differen¬ 
tiable with respect to the parameter 6£ G ©j, 


be satisfied. A design £* is a local KL P -optimal discriminating design, if and only if there exist 
distributions p* 3 on the sets O*■(£*) defined in (2.4) such that the inequality 


(2.5) 


V Pij / I id (x, e u dij^iddij) < KLp(D 


is satisfied for all x E X. Moreover, there is equality in (2.5) for all support points of the local 
KLp-optimal discriminating design £*. 


If, additionally, 

Assumption 2.2 For any design £ such that KLp(£) > 0 and weight Pij ^ 0 the infima 
in (2.3) are attained at a unique points 9 i} j = 9ij(£) in the interior of the set O j, 

are one-point measures and the left-hand side 


is satisfied, then all measures p*j in Theorem 
of inequality (2.5) simplifies to 

( 2 . 6 ) 


2.1 


^(®, 0=5^ Pi,jhj( x i Kj)- 

ij=1 


Consequently, if £ is not a local KLp-optimal discriminating design, it follows that there exists 
a point x G X such that T(x,£) > KLp(£). 


Note that the criterion (2.3) depends on the unknown parameters 9\,...,9 U , which have to 
be specified by the experimenter for the competing model /],..., ff, respectively. Therefore 
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the criterion is a local one in the sense of Chernoff (1953). It was pointed out by Dette et al. 


(2012) that the optimal designs maximizing the criterion (2.3) are rather sensitive with respect 


to misspecification of these parameters. For this reason we will now propose a Bayesian version 
of the criterion in order to obtain robust discriminating designs for the competing models 

We denote by Vi a prior distribution for the parameter 6i in model /, (i = 1,... ,is) and define 
a Bayesian KL-optimality criterion by 


(2.7) 


KL?(f) = Tptj / KLytf.WW), 
*, 1=1 


ij =i J ^ 


inf / I itj (x,9i,9ij)^(dx)Vi(d9i) 


J X 


Optimal designs maximizing this criterion will be called Bayesian KL-optimal discriminating 


designs throughout this paper. We also note that the criterion (2.7) has been considered before 


by 

Tommasi and Lopez-Fida.lgo 

(2010 

) in the case of two competing regression models. 

It was pointed out by 

Dette et al. 

(2015 

) that the determination of Bayesian optimal discrim 


mating designs with respect to the criterion (2.7) is closely related to the problem of finding 


local optimal discriminating designs for a large class of competing regression models. To be 


precise, we note that in most applications the integral in (2.7) is evaluated by numerical inte¬ 


gration approximating the prior distribution by a measure with finite support. Consequently, 
if the prior distribution V r in the criterion is given by a discrete measure with masses r,;i,... 
at the points Aji,..., A a the criterion in (2.7) can be represented as 


( 2 . 8 ) 


IS p 

KL P (0 = J2J2 Pi ’i Tik n in J n / Ti ’ j 
i,j = l k= 1 


[x,X ik,9 itj )£{dx) 


which is a local KL-optimality criterion of the from (2.3), where the competing models are 
given by {fi(y, x , Ajfc)| k — 1,..., i = l,, u}. The only difference between the criterion 


obtained from the (discrete) Bayesian approach and the criterion (2.3) consists in the fact that 


- due to discretization of the prior distributions V \,..., V v - the criterion (2.8) involves substan¬ 


tially more comparisons of competing models fi(y,x, A^). As a consequence the computation 
of Bayesian KL-optimal discriminating design is computationally very challenging, because for 


each support point of the prior distribution in the criterion (2.8) the inhmum has to be calcu¬ 


lated numerically. In the following section we will propose several new algorithms to address 
this problem. In Section [4] it will be demonstrated that these methods yield very satisfactory 
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results in cases where commonly used algorithms are either very slow or fail to determine the 
Bayesian KL-optimal discriminating design. 


3 Efficient algorithms for Bayesian KL-optimal designs 


In this section we propose several algorithms for the calculation of Bayesian KL-optimal designs, 
which determine the optimal designs with reasonable accuracy and are computationally very 
efficient. As pointed out in Section [2] the Bayesian optimality criterion with a discrete prior 


distribution reduces to a local KL-optimality criterion of the form (2.3) with a large number of 


model comparisons. For this reason we will describe the numerical procedures in this section for 


the criterion (2.3). It is straightforward to extend the algorithms to the Bayesian criterion (2.8) 


and in the following Section [4] we will give some illustrations determining Bayesian KL-optimal 
discriminating designs by the new methods. 

Most of the algorithms proposed in the literature for the calculation of optimal designs are based 
on the fact which was mentioned in the paragraph following Theorem 2.1| More precisely, recall 


the definition of the function T in (2.6) and assume that the design £ is not a Bayesian KL- 


optimal discriminating design. It then follows under Assumption |2.2| that there exists a point 
x G A, such that the inequality 

tt(x,£)>KLp(0 


holds. Lopez-Fidalgo et al. ( 2007[ ) used this property to extend the algorithm of Atkinson and 
Fedorov (Jl975a) to the KL-optimality criterion. In the case of the local KL-optimality criterion 


(2.3) it reads as follows. 


Algorithm 3.1 Let £ 0 denote a given (starting) design and let (a s )fL 0 be a sequence of positive 
numbers, such that Hindoo a s = 0, as = °°> < 00 • F° r s = 0,1,... define 

£s + l (1 ®s£(-£s + 1 ) 5 

where £ s+ i = arg max xe ^ T(a;, £ s ). 

It can be shown that this algorithm yields a sequence of designs (£, s ).seN converging in the sense 
that lim^oo KLp(^ s ) = KLp(£*), where denotes a local KL-optimal discriminating design. 
However, it turns out that the rate of convergence is very slow. In particular, if there are many 
models under consideration, the algorithm is very slow and fails in some models to determine 
the local KL-optimal discriminating design (see our numerical example in Section [4]). One 


reason for these difficulties consists in the fact that Algorithm 3.1 usually yields a sequence of 
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designs with an increasing number of support points. As a consequence the resulting design 
(after applying some stopping criterion) is concentrated on a large set of points. In the case of 
normal distributed responses it is also demonstrated by Braess and Dette (2013) that Algorithm 


3.1 requires a large number of iterations if it is used for the calculation of local KL-optimal 


discriminating designs for more than two competing models. 

Following Dette et al. (2015) we therefore propose an alternative procedure for the calculation of 
local KL-optimal discriminating designs, which separates the maximization with respect to the 
support points and weights in two steps. In the discussion below we will present two methods 
for the calculation of the weights in the second step [see Section 3.1 and 3.2 for details]. 


Algorithm 3.2 Let £o denote a starting design such that KLp(£ 0 ) > 0 and define recursively 
a sequence of designs (£ s ) a =o,i,... as follows: 


(1) Let <S[ s ] denote the support of the design £ s . Determine the set £[ s ] of all local maxima of 
the function \P(x,£ s ) on the design space X and define <S[ s+ i] = «S[ s ] U . 

(2) Define £ = {«S[ s+1 ],u;} as the design supported at «S[ s+ i] (with a normalized vector w 
of non-negative weights) and determine the local KLp-optimal design in the class of all 
designs supported at <S[. s+1 ]. In other words: we determine the vector u;[ s +i] maximizing 
the function 


g(uj) = KL P ({5 [a+ i], w}) = V Pi,j inf V I id (x, 9 U 8 itj )w x 

h3 = 1 ze<S [s+ i] 

(here w x denotes the weights at the point x E <S s+ i). All points in <S[<, +1 ] with vanishing 
components in the vector of weights u;[ s+ i] will be be removed and the new set of support 
points will also be denoted by <S[ s+ i]. Finally the design £ s+ i is defined as the design with 
the set of support points «S[ s+ i] and the corresponding nonzero weights. 


It follows by similar arguments as given in Dette et al. (2015) that the sequence (£ s )s=o,i,... °f 
designs generated by Algorithm |3.2| converges to a local KL-optimal discriminating design. The 
crucial step in this algorithm is the second one, because it requires - in particular if a large 
number of competing models are under consideration - the calculation of numerous inhma. In 
order to address this problem we propose a quadratic programming and a gradient method in 
the following two subsections. 














3.1 Quadratic programming 


Let <S[ s+ i] = {xi,... ,x n } denote the set obtained in the first step of Algorithm 3.2 and recall 
the definition of the Knllback-Leibler distance in (2.1). In Step 2 of Algorithm |3.2| a design 
£ with masses at the points Xi,... ,x n has to be determined such that the function 


v n « 

a{u) = ^Pi,jJ2u k / log | 

i,j= 1 k= 1 ' 


fj (Vi @i,j) 


( y,x k ,9i)dn(y ). 


is maximal, where 
(3.1) 

Define 




/ lncr J 

f fi(y,x k ,9i ) 1 

J ° S 1 

l fj(y,Xk,9 id ) j 


fi(y,x k ,6i ) . 


R *,j = R *,j($ij) = 


fi(y,x k ,6i ) dfj(y, x k , 0 i:j 


fj(y,Xk,0ij ) 


6i,j=8i 




fc=i 


dfj(y,x k ,9ij) fi(y,x k ,9i ) 


00 . 




fi(y, x k ,9ij) 


dp(y) 


E R nxdj , 


k=1 


and consider a linearized version of the function g, that is 

My,xk,0i) 


(3.2) 


Z-' I /* 

= ™ in 5^ ^ w log 

ij=l fc=l lx 




fj (Vi 88 ki 9i,j') 


T nT 


fi(y,x k ,9i)dn(y ) 


Note that the minimum with respect to the parameters CKjj G is achieved for 

R ^> 

where the matrix 12 is defined by 12 = diag(xi,..., u> n ) and u> = (cni,..., 0J n ) T . For the following 
discussion we define by A = {u 6 R n | w* > 0 (i — 1,..., n) Yfi=i u i = 1} th e simplex in M n . 


a i,j 


jT 1 (y)^J i ,j{y)dn(dy) 


hj y 


Lemma 3.3 If Assumptions 2.1 and 2.2 are satisfied, then each maximizer of the function g{- 
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in IS. is a maximizer of g(-) in A and vice versa. Moreover, 


max g (uj) = max g(u>). 

wGA A 


A proof of Lemma 3.3 can be found in the Section [5] With the notations 

fi(y,x k ,9i) 


b m = = ( / log 

we have 


fj(y, Xki Oi,j 


fi(y,x k ,Oi)n(dy) = (Iij(x k ,0i,9ij) 


k=1 


k= 1 


g[u) = b t oj — uj t Q(uj)lj 

where the vector b £ R n and the n x n matrix Q are defined by 

QM = (yf Jlj(,yWuj)J i:j (y)n(dy)^ R f d , 

and b = E^ J=1 p ? ; J bj J -, respectively. If we ignore the dependence of the matrix Q(cn) and 
consider this matrix as fixed for a given matrix Q = diag(cJi,... ,cJ n ), we obtain a quadratic 
programming problem, that is 

(3.3) uj) = —(n T Q(a;) u + b T a; —> max. 

ueA 

This problem can now be solved iteratively substituting each time the solution obtained in the 
previous iteration instead of uj. 


Example 3.4 In this example we illustrate the calculation of the function Q(cu) under several 
distributional assumptions. 


(1) Ucinski and Bogacka (2005) considered the regression model with normal distributed 

(y - v(x,e)) 2 ' 


heteroscedastic errors, that is 

f(y,x,9) = 


a/27tu 2 (x, 6) 


exp 


2v 2 (x, 6) 


where rj(x, 6 ) and v 2 (x, 6) denotes the expectation and variance of the response at exper¬ 
imental condition x. In this case the Kullback-Leibler distance between the two densities 
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ft and fj is given by 


/„(*, e„ <hj) = + + log ' ”* 




VjfaOij) 


v](x,9i j) 

1 - 1, 


v?(aj, 0*) 


and a straightforward calculation gives for the function g in (3.2) the representation 


i,j =i 

where Oj = diag (cni,..., u n ), 


g(uj) = Y min 'n'j J! jili.i iJ <\ iJ + 2cn r R Uj a^ + bfju] , 

. . . ,v '.i ’ ’ 


Sij(x, 9ij) 

.1 


v?(a?, 0i) , 




m 

Ohjj (x i ~, 


+ log 




j 9jj) 


Vi(x,9i) -gj(x,6 id ) _ 

Vj(x,6 id ) 


86 


hj 


k=l,...,n 


Rj,j I 6’i. j (Xk i 6j j ) 


8 h f j(xk,9 


hjj 


b i .j (li j{xki 6ii 6\ t j) 


89 i, 


k=l,...,n 


— ''j, • 


_l_ 1 8s ij j(xk, 9ij) 


89 i. 


0i,j-0i,3/ k=l,...,n 


(3) Lopez-Fidalgo et al. (2007) considered the regression model (1.1) with log-normal distri¬ 


bution with parameters fi{x, 9) and cr 2 (x, 9). This means that the mean and the variance 
are given by 


E[Y] = r]{x,9) = exp| ° ^ +n(x,9) 

Var(y) = v 2 (x,9) = g 2 (x,9){exp{a 2 (x,9)} — 1} , 


respectively, and the density of the response Y is given by 


f{y,x,9) = 


xV2na(x, 9) 


exp < — 


{log(y) ~ K X ,9)Y 

2cr 2 (x, 9) 


In the paper Lopez-Fidalgo et al. (2007) it was shown that the Kullback-Leibler distance 
between two log-normal densities with parameters /r?(x, 9g) and crj{x, 9g) (£ = i,j ) is given 
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by 

(3.4) 

where 


T (r,, a a \ _ I/ s (, T n friM) ~ Hjfadij )] 2 _ 

lij[X,U x ,Uij) 2 “h3 / + <J 2 (x 9) ^ ’ 


Sij(x,9ij) = log 




+ 


a 2 (x,9 i}j )l &i(x, Oi) ’ 


and 


o 


x, Oi) = log [1 + v 2 {x , 9i)/r] 2 (x, 9i)] , 
9i) = log [r)i(x, 9i )] - cr?(x, 9i)/ 2. 


Now a straightforward calculation gives for the function g in (3.2) the representation 


9{u) = o E min \afjJfj ft iJjjaij - 2u T K ijOiij + b^-cu] , 

Z 

1,3 = 1 


where Q* = diag (cui ,... ,u> n ), 

\ 


dOi.i 


’h3 


R 


0i(zi,0i) 

/ fc=l,...,n 

[/^(ajfc, 0*) - /Xj-(aj fc , Sj)] 




cr?(a; fc , 0.) 


1 9i j) 


2 30^- 



fc=l,...,n 


bij ( Ii,j 


3.2 A gradient method 


In this section we describe a specialized gradient method for second step of Algorithm 3.2 
function To be precise we introduce theunctions 

V 

v k (uj ) = PijIij{xk,0i,0%j{u)), k = 1,.. .,n, 
i,j =1 
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where 9ij = 9 it j(oS) is defined in (3.1). Next we iteratively calculate a sequence of vec¬ 
tors (o;( 7 )) 7= o,i,... starting with a vector cu( 0 ) = w (for example equal weights). For = 

• • •, CU( 7 y n ) we determine indices k and k corresponding to max!< fe < n Vk(uJ( 7 )) and min!< fc < n Vk((J( 7 )), 
respectively, and define 


a* = arg max g(cj (7) (a)), 

0<a<c j 


(3.5) 

where the vector oJ( 7 )(a) = (cJ( 7 )y(a),..., cJ( 7 ) ;n (a)) is given by 

oj( 7 ) t i + a if i — k 
W( 7 ),;(«) = ^ U( 7 ),i - ol if i — k 
U( 7 ),i else 

The vector W( 7+ i) of the next iteration is then defined by W( 7+ i) = UJ ( 7 )(a*). ft follows by 


similar arguments as in Dette et al. (2015) that the generated sequence of vectors converges to 


a maximizer of the function g. 


4 Implementation and numerical examples 

In this section we illustrate the new algorithms calculating Bayesian KL-optimal discriminating 
designs for several models with non-normal errors. We begin giving a few more details regarding 
the implementation. 

(1) As pointed out in Section [2] a Bayesian KL-optimality criterion is reduced to a local 
criterion of the form (2.3[) for a large number of model comparisons. For illustration 


purposes, consider the criterion (2.8), where v — 2, p 12 = 1, p 2 l = 0 and the prior for the 
parameter 9\ puts masses r±,... at the points Ai,..., A^. This criterion can be rewritten 


as a local criterion of the form (2.3), i.e. 


(4.1) 


m r. 

kl p( 0 = inf n / lij{x,0i,0ij)£{dx) 

,, i 0: " i H 


where Piy+i = Ti,... ,Piy+i = T£ and all other weights p hJ are 0. The extension of this 
approach to more than two models is easy and left to the reader. 


(2) In Step 1 of Algorithm 3.2 all local maxima of the function \H(x, £ s ) are aded as possible 
support points of the design in the next iteration. In order to avoid the problem of 
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accumulating too many support points we remove in each iteration those points with a 
weight smaller than m 0 ' 25 , where m = 2.2 x 1CT 16 is the working precision R. 


(3) In the implementation of the quadratic programming method for Step 2 of Algorithm 3.2 


(see Section 3.1) we perform only a few iterations such that an improvement compared 
to the starting design is obtained. This speeds up the convergence of the procedure 
substantially without affecting the convergence in all examples under consideration. 


(4) In the implementation of the gradient method for Step 2 of Algorithm 3.2 (see Section 


3.2) we use a linearization of the optimization problem in order to improve the speed of 


the procedure. 

We are now ready to demonstrate the advantages of the new method in several examples 
calculating Bayesian KL-optimal discriminating designs. For the sake of brevity we restrict 
ourselves to the case of non-linear regression models, where the response has a log-normal 


distribution with parameters p(x,6) and a 2 (x,6) as described in Example 3.4 


Example 4.1 Our Erst example refers the problem of determining local KL-optimal designs for 
a situation investigated by Lopez-Fidalgo et al. (2007). Motivated by pharmacokinetic practice 
[see Lindsey et al. (2001); Crawley (2002)] these authors determined local KL-optimal designs 


for two log-normal models with mean functions 
(4.2) 


Vl(x, 0l) = + 01,3^, 


01,2 + X 


V2(x,9 2 ) = 


02,lX 
02,2 + X 


on the interval X = [0.1,5]. They assumed equal and constant variances and considered 
model r/i with parameter 9\ = (1,1,1) as fixed. This corresponds to the choice v = 2 and 
Pi ,2 = 1, P 2 ,i = 0 in the criterion (2.3). In Table [I] we resent the optimal design calculating by 
the new algorithms for various choices of the mean and variance function, that is 


(4.3) 


(1) v\{x,0i) = vl(x,d 2 ) = 1 

(2) al(x,6i) = al(x,6 2 ) = 1 

(3) v 2 (x, di) = exp (rii(x, 0*)) 


All designs have an efficiency that is at least 0.999, and we have used three methods for the 
calculation of the local KL-optimal design. The first procedure is a classical exchange type 
method as proposed by Lopez-Fidalgo et al. (2007). The other methods are the two versions of 


the new Algorithm 3.2 with the modifications described in Section 3.1 (quadratic programming) 


and 3.2| (gradient method). For the case (2) of equal variances in (4.3) the corresponding 
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function in (3.4) simplifies to (pi(x, 6 * 1 ) —/i 2 (ir, 6 ^ 1 , 2 )) 2 • Consequently, one can use the procedure 
for the special case of a normal distributed response developed in Dette et al. (2015), where 
/j,i(x,9i) = log 7 ii(x, 9 i) (i = 1,2), which works significantly faster. It should be noted that the 
calculated designs slightly differ from those in Lopez-Fidalgo et al. (2007). 

In Table [l] we also show the computation time (CPU time in seconds on a standard PC with 
an intel core i7-4790K processor) for the different methods. We observe that the methods 
developed in this paper work substantially faster than the exchange type algorithm proposed in 


Lopez-Fidalgo et al. (2007). For example, the new gradient methods are between 5 and 30 times 


faster, while the quadratic programming approach yield to a procedure which is between 25 
and 120 times faster than the classical exchange type algorithm. In the case of two competing 
models the exchange type algorithm is still finding the local KL-optimal discriminating design 
in a reasonable time, but the difference become more important if a discriminating design has 
to be found for more than two competing models or if a Bayesian KL-optimal design has to be 
determined. Some of these situations are discussed in the following examples. 


case 

KL-opt. design 

AF 

grad 

quad 

( 1 ) 

0.130 2.501 

0.489 0.378 

5.000 

0.133 

7.15 

0.56 

0.06 

( 2 ) 

0.100 1.569 

0.294 0.500 

5.000 

0.206 

2.74 

0.52 

0.01 

(3) 

0.100 1.218 

0.326 0.510 

5.000 

0.164 

10.87 

0.33 

0.08 


Table 1: Local KL-optimal discriminating designs for the models in (4.2). The responses 

(p|. 


see 


are log-normal distributed with different specifications of the mean and variance 
Column 3-5 show the computation time of the different algorithms (exchange type Algorithm 


3.1 (AF) proposed in Lopez-Fidalgo et al. (2007) and Algorithm\3.2\ with a gradient (grad) and 


quadratic programming method (quad) in Step 2). 


Example 4.2 In our second example we calculate Bayesian KL-optimal discriminating design 
for two competing exponential models 

(4.4) rji(x , 6 k) = 6 k ,1 - 6 k ,2 exp(- 6 k, 3 x 01 ’ 4 ); 

r] 2 (x, 62 ) = 6 * 2,1 - 6 * 2,2 exp(— 6 * 2 , 3 x). 


on the interval [0,10], where model ?/i is again fixed. Discriminating designs for these models 
have been determined by Dette et al. (2015) under the assumption of a normal distribution, 
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( 1 ) 



( 2 ) 



Figure 1: The function on the left hand side of inequality (2.5) in the equivalence Theorem 


2.1 for the numerically calculated Bayesian KL-optimal discriminating designs. The competing 


regression models are given in (4.4). 


and we will now investigate how the designs change for the log-normal distributed responses 


with mean and variance specified by (4.3). Following these authors we considered independent 


prior distributions supported at the points 


(4.5) 


Pj + 


a/03(z - 3) 


* = !>•••, 5; .7 = 3,4, 


for the parameters # 1,3 and # 1,4 where ps = 0.8, /14 = 1.5. The corresponding weights at these 
points are proportional (in both cases) to 


(4.6) 


1 


y/2ir ■ 0.3 


exp 


3) 2 


i = 1 ,- - -, 5 


Note that the optimal discriminating designs do not depend on the linear parameters of rp , 
for which we have chosen as 6 * 2,1 = 2 and 02,2 = 1 - 

The Bayesian KL-optimal discriminating designs for log-normal distributed responses are dis¬ 
played in Table [2] for the different specifications of the mean and variance in (4.3). In Figure 
[l] we show the function on the left hand side of inequality (2.5) in the equivalence Theorem 
H3 Comparing the computational times in Table [2] we observe again that using quadratic 


programming in Step 2 of Algorithm T 2 is substantially faster than the gradient method. 

It might be of interest to compare the Bayesian optimal discriminating designs for the various 
log-normal distributed responses with the design for normal distributed responses determined 


in Dette et al. (2015). This design is supported at the £ve(!) points 0.000, 0.452, 1.747, 4.951 


16 



































and 10.000 with masses 0.207, 0.396, 0.292, 0.003 and 0.102, respectively. The efficiencies 


Eff 


U) 

KL, 


,(£«) = 


KL$ 


(C 


(i)) 


snp KLp\r]) 


under misspecihcation of the distribution of the response are depicted in Table [3j For exam¬ 
ple, the efficiency of the design £% calculated under the assumption homoscedastic normal 


distributed responses in the model with log-normal distributed responses in (4.3)(2) is given by 


95.3%. We observe that the Bayesian optimal discriminating designs calculated for normal dis¬ 
tributed responses are rather robust and have good efficiencies for the log-normal distribution. 


(4.3) 

design 

AF 

grad 

quad 

(1) 

0 0.406 1.706 10 

0.186 0.418 0.289 0.107 

298.37 

44.36 

3.7 

(2) 

0 0.374 1.650 10 

0.189 0.397 0.311 0.103 

390.44 

7.39 

2.39 

(3) 

0 0.356 1.604 10 

0.186 0.394 0.313 0.107 

570.45 

39.19 

4.42 


Table 2: Bayesian KL-optimal discriminating designs for the models in (4.4). The responses 
are log-normal distributed with different specifications of the mean and variance - see (4.3). The 
prior distribution is defined by (4.5) and (4.6). Column three and four show the computation 
time of the new Algorithm 3.2 proposed in this paper with a gradient (grad) and quadratic 
programming method (quad) in Step 2. 



(0) 

(1) 

(2) 

(3) 

(0) 

1 

0.978 

0.953 

0.908 

(1) 

0.981 

1 

0.988 

0.966 

(2) 

0.951 

0.987 

1 

0.992 

(3) 

0.923 

0.970 

0.996 

1 


Table 3: Efficiencies of Bayesian KL-optimal discriminating designs for the models in (4.4) 
under different distributional assumptions for the responses. (0): homoscedastic normal distri¬ 
bution; (1) - (3): log-normal distribution with different specifications of the mean and variance 
- see (4731). 
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(4.9) 

design 

AF 

grad 

quad 

( 1 ) 

0.759 67.32 248.6 500 

0.419 0.156 0.233 0.192 

1674.14 

679.52 

48.91 

( 2 ) 

0 58.9 220.6 500 

0.200 0.354 0.247 0.199 

- 

255.03 

33.42 

(3) 

0 33.12 78.0 161.6 215.7 500 

0.279 0.092 0.225 0.003 0.224 0.177 

2382.64 

631.53 

82.33 


Table 4: Bayesian KL-optimal discriminating designs for the competing dose response models 
in (4.7). The responses are log-normal distributed with different specifications of the mean 
and variance - see (4.9). The prior distribution is a uniform distribution on 81 points as 
specified in (4.8). Column three, four and five show the computation time of the exchange 
type algorithm (AF), the new Algorithm 3.2 proposed in this paper with a gradient (grad) and 
quadratic programming method (quad) in Step 2. 


Example 4.3 Our final example refers to the construction of Bayesian KL-optimal discrimi¬ 


nating designs for several dose response curves, which have been recently proposed by Pinheiro 


et al. (2006) for modeling the dose response relationship of a Phase II clinical trial, that is 


Vi( x ,0i) = #i,i + 0 i, 2 x; 

(4.7) rj 2 (x, 6*2 ) = 6 > 2 ,i + 0 2A x{0 2A - x)\ 

r l'ii x i 6*3) = 6*3,1 + 6*3,2^/(6*3,3 + x )] 

Pi{x, Of) = 6*4,1 + 6*4, 2/(1 + exp(6*4,3 - x)/0 AA )\ 


where the designs space (dose range) is given by the interval X = [0,500]. In this reference 
some prior information regarding the parameters for the models is also provided., that is 


0 1 = (60, 0.56), 0 2 = (60, 7/2250, 600), 0 3 = (60, 294, 25), 0 A = (49.62, 290.51,150,45.51). 


Dette et al. (2015) determined Bayesian KL-optimal discriminating designs for these models 
under the assumption of normal distributed responses, where they used p t .j — 1 / 6 , (1 < j < 
i < 4) and they assumed that there exist only uncertainty for the parameter 0 A . We will now 
consider similar problems for log-normal distributed responses, where the prior distribution is 
a uniform distribution at 81 points in M 4 , that is 


(4.8) (49.62 + ci, 290.51 + c 2 ,150 + c 3 ,45.51 + c 4 ) 

with 01 , 02 , 03,04 e {—20,0,45}. Note that we cannot use the prior distribution considered in 
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Dette et al. (2015) because this would yield a negative mean r)i(x,6i). The resulting Bayesian 


optimality criterion (2.8) consist of 246 model comparisons and Bayesain KL-optimal discrim¬ 


inating designs are depicted in Table [4] for the 


cases 


(4.9) 


(1) u?(x,0i) = 1 , i = 1,2, 3,4; 

(2) a?(x,9i) = l,i = l, 2, 3,4; 

(3) v?(x, 9i) = exp (rji(x, 6>*)/100) , i = 1, 2, 3,4. 


All calculated designs have at least efficiency 99.9% and the corresponding plots of the equiv- 

shown in Figure [2] In the models specified by (4.7) all new algorithms 


alence Theorem 2.1 


are 


were able to find the Bayesian KL-optimal discriminating design, where the exchange type 


algorithm failed in the case (4.9) (2). Moreover, in the other cases the new methods are sub¬ 


stantially faster than the exchange type method. For example, the gradient method yields only 
25% — 30% of the computational time, while the quadratic programming approach is about 
30 — 35 times faster. 




( 1 ) 


( 2 ) 



Figure 2: The function on the left hand side of inequality (2.5) in the equivalence Theorem 


2.1 for the numerically calculated Bayesian KL-optimal discriminating designs. The competing 


(4.9) 


regression models are given in (4.7) and the scenarios for log-normal distribution specified in 
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5 Appendix: Proof of Lemma 3.3 


By the construction of the function g we have g{uj) < g{oj) for all u 6 A. Let co* G 
arg max weA g{oj) and define the function 


n 


= Vcufc / log <j ^j Xkl V ^‘\ \ fi(xk,y,di)dfj,(dy) 


k= 1 


fj(x k ,y,0. 


h3J 


If dij = argmin^, ge . is the miminizer of it follows from Assumption 

d 


2.2 


that 


86 


h3 


v itj (e itj ,u*) „ = R iJ (0 ii >* = o, 


and we obtain 




-1 


.Ay)dy) Rij(««K = o 


Inserting this value in (3.2) gives g(u*) = g(u*), i.e. max^gA g(oS) = ma^AsM. Now 


let oj* = argmax weA g(u). From the above equality it follows that a t .j = 0 and therefore 
= 0, that is u* = argmaxr/(a;). □ 
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